 grammatical value


Principles of semantic and functional efficiency in grammatical patterning

Cheng, Emily, Franzon, Francesca

arXiv.org Artificial Intelligence

Grammatical features such as number and gender serve two central functions in human languages. While they encode salient semantic attributes like numerosity and animacy, they also offload sentence processing cost by predictably linking words together via grammatical agreement. Grammars exhibit consistent organizational patterns across diverse languages, invariably rooted in a semantic foundation, a widely confirmed but still theoretically unexplained phenomenon. To explain the basis of universal grammatical patterns, we unify two fundamental properties of grammar, semantic encoding and agreement-based predictability, into a single information-theoretic objective under cognitive constraints. Our analyses reveal that grammatical organization provably inherits from perceptual attributes, but that grammars empirically prioritize functional goals, promoting efficient language processing over semantic encoding.


Computer-Aided Modelling of the Bilingual Word Indices to the Ninth-Century Uchitel'noe evangelie

Ruskov, Martin, Taseva, Lora

arXiv.org Artificial Intelligence

The development of bilingual dictionaries for medieval translations presents diverse difficulties. These result from two types of philological circumstances: a) the asymmetry between the source language and the target language; and b) the varying available sources of both the original and translated texts. In particular, Tihova's full critical edition of Constantine of Preslav's Uchitel'noe evangelie ('Didactic Gospel') gives a relatively good idea of the Old Church Slavonic translation but not of its Greek source text. This is because Cramer's edition of the catenae, used as the parallel text in it, is based on several codices whose text does not fully coincide with the Slavonic. As a result, newly discovered parallels from Byzantine manuscripts and John Chrysostom's homilies have to be added. Our approach to these issues is a step-wise process with two main goals: a) to facilitate the philological annotation of input data and b) to consider the manifestations of the mentioned challenges first separately, in order to simplify their resolution, and then in combination. We demonstrate how we model various types of asymmetric translation correlates and the variability resulting from the plurality of sources. We also demonstrate how all these constructions are modelled and processed into the final indices. Our approach is designed with generalisation in mind and is intended to be applicable to other translations from Greek into Old Church Slavonic as well.


Improving part-of-speech tagging via multi-task learning and character-level word representations

Anastasyev, Daniil, Gusev, Ilya, Indenbom, Eugene

arXiv.org Machine Learning

In this paper, we explore ways to improve POS tagging using various types of auxiliary losses and different word representations. As a baseline, we used a BiLSTM tagger, which is able to achieve state-of-the-art results on sequence labelling tasks. We developed a new method for character-level word representation using a feedforward neural network. This representation gave us better results in terms of both the speed and the performance of the model. We also applied a novel technique of pretraining such word representations with existing word vectors. Finally, we designed a new variant of auxiliary loss for sequence labelling tasks: an additional prediction of the neighbour labels. This loss forces a model to learn the dependencies inside a sequence of labels and accelerates the process of training. We test these methods on English and Russian.
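
The neighbour-label auxiliary loss mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how neighbours are defined or how the losses are combined, so everything here is an assumption: "neighbours" are taken to be the immediately preceding and following labels, sequence edges are filled with a hypothetical `<pad>` symbol, and the auxiliary terms are added to the main loss with a hypothetical weight `aux_weight`.

```python
PAD = "<pad>"  # hypothetical padding label for sequence edges (assumption)


def neighbour_label_targets(labels, pad=PAD):
    """Build auxiliary targets: for each token, its previous and next label.

    Assumes "neighbour labels" means the labels at positions i-1 and i+1.
    """
    if not labels:
        return [], []
    prev_labels = [pad] + labels[:-1]  # label of the token to the left
    next_labels = labels[1:] + [pad]   # label of the token to the right
    return prev_labels, next_labels


def combined_loss(main_loss, prev_loss, next_loss, aux_weight=0.5):
    """Add the two auxiliary losses to the main tagging loss.

    aux_weight is an illustrative value, not taken from the paper.
    """
    return main_loss + aux_weight * (prev_loss + next_loss)


# Usage: auxiliary targets for a three-token sentence.
tags = ["DET", "NOUN", "VERB"]
prev_tags, next_tags = neighbour_label_targets(tags)
```

In a tagger, each token's hidden state would feed three classifiers (current, previous, and next label), and the three per-token losses would be merged as in `combined_loss`.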